AITopics | high-dimensional logistic regression

The Impact of Regularization on High-dimensional Logistic Regression

Neural Information Processing Systems, Dec-25-2025, 20:42:10 GMT

Logistic regression is commonly used for modeling dichotomous outcomes. In the classical setting, where the number of observations is much larger than the number of parameters, properties of the maximum likelihood estimator in logistic regression are well understood. Recently, Sur and Candes~\cite{sur2018modern} have studied logistic regression in the high-dimensional regime, where the number of observations and parameters are comparable, and show, among other things, that the maximum likelihood estimator is biased. In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function. An advantage of RLR is that it allows parameter recovery even for instances where the (unconstrained) maximum likelihood estimate does not exist. We provide a precise analysis of the performance of RLR via the solution of a system of six nonlinear equations, through which any performance metric of interest (mean, mean-squared error, probability of support recovery, etc.) can be explicitly computed. Our results generalize those of Sur and Candes and we provide a detailed study for the cases of $\ell_2^2$-RLR and sparse ($\ell_1$-regularized) logistic regression. In both cases, we obtain explicit expressions for various performance metrics and can find the value of the regularization parameter that optimizes the desired performance. The theory is validated by extensive numerical simulations across a range of parameter values and problem instances.
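The abstract's point that RLR can still produce an estimate when the unconstrained MLE does not exist can be illustrated with a small sketch. This is not the paper's analysis: the function name `fit_logistic`, the toy data, and the scaling of the objective (mean negative log-likelihood plus `(lam/2) * ||beta||^2`) are assumptions chosen for illustration. On linearly separable data, plain gradient descent on the unregularized loss keeps growing the coefficient, while the $\ell_2^2$-regularized fit settles at a finite value.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lam, steps=2000, lr=0.5):
    """Gradient descent on mean negative log-likelihood + (lam/2)*||beta||^2.

    xs: list of feature lists, ys: labels in {0, 1}.
    The objective scaling is an illustrative assumption, not the paper's.
    """
    n, p = len(xs), len(xs[0])
    beta = [0.0] * p
    for _ in range(steps):
        grad = [lam * b for b in beta]  # gradient of the ridge penalty
        for x, y in zip(xs, ys):
            r = sigmoid(sum(b * v for b, v in zip(beta, x))) - y
            for j in range(p):
                grad[j] += r * x[j] / n  # gradient of the mean log-loss
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta

# Toy, perfectly separable data (hypothetical): the sign of x determines y.
xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [0, 0, 1, 1]

for steps in (2000, 4000):
    unreg = fit_logistic(xs, ys, lam=0.0, steps=steps)[0]
    ridge = fit_logistic(xs, ys, lam=0.1, steps=steps)[0]
    print(steps, "unregularized:", round(unreg, 3), "l2-regularized:", round(ridge, 3))
```

Running more iterations keeps increasing the unregularized coefficient, which reflects the absence of a finite maximizer, whereas the regularized coefficient stops changing.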

high-dimensional logistic regression, name change, regularization, (6 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression

Neural Information Processing Systems, Dec-25-2025, 06:52:49 GMT

Logistic regression remains one of the most widely used tools in applied statistics, machine learning and data science. However, in moderately high-dimensional problems, where the number of features $d$ is a non-negligible fraction of the sample size $n$, the logistic regression maximum likelihood estimator (MLE), and statistical procedures based on the large-sample approximation of its distribution, behave poorly. Recently, Sur and Candès (2019) showed that these issues can be corrected by applying a new approximation of the MLE's sampling distribution in this high-dimensional regime. Unfortunately, these corrections are difficult to implement in practice, because they require an estimate of the \emph{signal strength}, which is a function of the underlying parameters $\beta$ of the logistic regression. To address this issue, we propose SLOE, a fast and straightforward approach to estimate the signal strength in logistic regression. The key insight of SLOE is that the Sur and Candès (2019) correction can be reparameterized in terms of the corrupted signal strength, which is only a function of the estimated parameters $\widehat \beta$. We propose an estimator for this quantity, prove that it is consistent in the relevant high-dimensional regime, and show that dimensionality correction using SLOE is accurate in finite samples. Compared to the existing ProbeFrontier heuristic, SLOE is conceptually simpler and orders of magnitude faster, making it suitable for routine use. We demonstrate the importance of routine dimensionality correction in the Heart Disease dataset from the UCI repository, and a genomics application using data from the UK Biobank.

high-dimensional logistic regression, sloe, statistical inference, (9 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Variational empirical Bayes variable selection in high-dimensional logistic regression

Tang, Yiqi, Martin, Ryan

arXiv.org Machine Learning, Feb-14-2025

Logistic regression involving high-dimensional covariates is a practically important problem. Often the goal is variable selection, i.e., determining which few of the many covariates are associated with the binary response. Unfortunately, the usual Bayesian computations can be quite challenging and expensive. Here we start with a recently proposed empirical Bayes solution, with strong theoretical convergence properties, and develop a novel and computationally efficient variational approximation thereof. One such novelty is that we develop this approximation directly for the marginal distribution on the model space, rather than on the regression coefficients themselves. We demonstrate the method's strong performance in simulations, and prove that our variational approximation inherits the strong selection consistency property satisfied by the posterior distribution that it is approximating.

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2502.10532

Genre:

Research Report > New Finding (0.72)
Research Report > Experimental Study (0.72)


Reviews: The Impact of Regularization on High-dimensional Logistic Regression

Neural Information Processing Systems, Jan-26-2025, 09:15:30 GMT

Originality: This paper develops asymptotic theory for high-dimensional regularized logistic regression (LR). The main result of the paper (Theorem 1) is proved for any locally-Lipschitz function \Psi, which in special cases provides asymptotics for common descriptive statistics such as correlation, variance, and mean-squared error. Special-case results for L1- and L2-regularized LR are also derived, along with the quantities highlighted in 1 above. The paper also demonstrates that the numerical simulation results align with the theoretical relations. Quality: The paper contains high-quality results and proofs; the notation and setup are well defined in Section 2 before the main results.

high-dimensional logistic regression, regularization, regularized lr, (4 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.63)
Research Report > Experimental Study (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.63)
Information Technology > Mathematics of Computing (0.39)


Reviews: The Impact of Regularization on High-dimensional Logistic Regression

Neural Information Processing Systems, Jan-26-2025, 09:15:19 GMT

The authors study the limiting distribution of certain functionals of the penalized maximum likelihood estimator in regression. The paper contains nontrivial new extensions of the work of Sur and Candes in the unpenalized case, and is well-written and interesting. The reviews were mostly positive and the paper is in good shape.

high-dimensional logistic regression, regularization

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.40)
Research Report > Experimental Study (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)


SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression

Neural Information Processing Systems, Jan-19-2025, 14:07:46 GMT

Logistic regression remains one of the most widely used tools in applied statistics, machine learning and data science. However, in moderately high-dimensional problems, where the number of features d is a non-negligible fraction of the sample size n, the logistic regression maximum likelihood estimator (MLE), and statistical procedures based on the large-sample approximation of its distribution, behave poorly. Recently, Sur and Candès (2019) showed that these issues can be corrected by applying a new approximation of the MLE's sampling distribution in this high-dimensional regime. Unfortunately, these corrections are difficult to implement in practice, because they require an estimate of the \emph{signal strength}, which is a function of the underlying parameters \beta of the logistic regression. To address this issue, we propose SLOE, a fast and straightforward approach to estimate the signal strength in logistic regression.

correction, high-dimensional logistic regression, sloe, (7 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)


The Impact of Regularization on High-dimensional Logistic Regression

Neural Information Processing Systems, Oct-10-2024, 17:27:40 GMT

Logistic regression is commonly used for modeling dichotomous outcomes. In the classical setting, where the number of observations is much larger than the number of parameters, properties of the maximum likelihood estimator in logistic regression are well understood. Recently, Sur and Candes \cite{sur2018modern} have studied logistic regression in the high-dimensional regime, where the number of observations and parameters are comparable, and show, among other things, that the maximum likelihood estimator is biased. In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function. An advantage of RLR is that it allows parameter recovery even for instances where the (unconstrained) maximum likelihood estimate does not exist.

high-dimensional logistic regression, maximum likelihood estimator, regularization, (3 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)


The Impact of Regularization on High-dimensional Logistic Regression

Salehi, Fariborz, Abbasi, Ehsan, Hassibi, Babak

Neural Information Processing Systems, Mar-19-2020, 01:32:40 GMT

Logistic regression is commonly used for modeling dichotomous outcomes. In the classical setting, where the number of observations is much larger than the number of parameters, properties of the maximum likelihood estimator in logistic regression are well understood. Recently, Sur and Candes \cite{sur2018modern} have studied logistic regression in the high-dimensional regime, where the number of observations and parameters are comparable, and show, among other things, that the maximum likelihood estimator is biased. In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function. An advantage of RLR is that it allows parameter recovery even for instances where the (unconstrained) maximum likelihood estimate does not exist.

high-dimensional logistic regression, maximum likelihood estimator, regularization, (3 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)


The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

Candes, Emmanuel J., Sur, Pragya

arXiv.org Machine Learning, Apr-25-2018

This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp `phase transition'. We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n \rightarrow \kappa$, we show that if the problem is sufficiently high dimensional in the sense that $\kappa > h_{\text{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa < h_{\text{MLE}}$, the MLE asymptotically exists with probability one.
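For context, the logistic MLE fails to exist when the two classes can be completely separated by a hyperplane. The sketch below is a heuristic check for that condition using the perceptron rule, assuming a model without an intercept and labels in {-1, +1}. It does not compute the boundary curve $h_{\text{MLE}}$ from the paper; the function name `perceptron_separates`, the epoch cap, and the toy data are hypothetical. Finding a separating vector proves separation, but hitting the cap does not prove that none exists.

```python
def perceptron_separates(xs, ys, max_epochs=1000):
    """Search for w with y_i * <w, x_i> > 0 for every sample.

    xs: list of feature lists, ys: labels in {-1, +1}.
    Returns w if found within max_epochs, otherwise None (inconclusive).
    """
    p = len(xs[0])
    w = [0.0] * p
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            if margin <= 0:
                # Standard perceptron update on a misclassified point.
                w = [wj + y * xj for wj, xj in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w  # complete separation: the MLE does not exist
    return None

# Hypothetical toy data: the first set is separable through the origin,
# the second is not (both points lie on the same ray with opposite labels).
separable = perceptron_separates(
    [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]], [1, 1, -1, -1]
)
overlapping = perceptron_separates([[1.0, 0.0], [2.0, 0.0]], [1, -1])
print("separating vector:", separable)
print("overlapping result:", overlapping)
```

An exact test would instead solve a linear feasibility program; the perceptron is used here only to keep the sketch dependency-free.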

artificial intelligence, machine learning, mle, (16 more...)

arXiv.org Machine Learning

1804.09753

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

